Accompanying Note

An approximation technique for small
noise open loop control problems

This paper is concerned with the development of an
approximation technique for the solution of a class of fixed
stopping time small noise open loop control problems. These
problems arise by adding a white noise term with a
small coefficient $(2\varepsilon)^{1/2}$ to the system equations in the
deterministic control problem.

An approximation scheme is developed that has the advantage
that one finds approximately optimal controls simultaneously
for all sufficiently small $\varepsilon$. The scheme requires the solution
of a generalized linear regulator problem which is easily solved
numerically. The numerical method is given and an
example illustrating the efficiency of the method is also
presented.
1. Introduction

This paper is concerned with the development of an approximation
technique for the solution of a class of fixed stopping
time small noise open loop control problems. These problems
arise by adding a white noise term with a small
coefficient $(2\varepsilon)^{1/2}$ to the system equations in the deterministic
control problem.

In earlier work [4] we derived expansions of class $C^\infty$
in $\varepsilon$ of the optimal open loop cost and control for a very
special class of problems in which each open loop control generated
a nondegenerate Gaussian process. This property allowed the
conversion of the stochastic control problem into an equivalent
deterministic control problem. Under less restrictive assumptions
in [ ] we were able to derive a truncated expansion of the
optimal cost, but were unable to theoretically establish an
expansion of the optimal cost.

Motivated by these previous results, we consider more
general open loop control problems in which each open loop
control does not necessarily generate a Gaussian process and
attempt to find "best" controls of the form $U^0 + \varepsilon V$. Here the
function $U^0$ denotes the optimal open loop deterministic control.
This approximation scheme has the advantage that one finds
approximately optimal controls simultaneously for all sufficiently
small $\varepsilon$. This scheme leads to the selection of a control
$U^0 + \varepsilon V$ which performs better than (or at least as well as) $U^0$
in the $\varepsilon$ problem for all sufficiently small $\varepsilon$. The approximation
technique for the calculation of $V$ leads to a generalized
linear regulator problem which is easily solved numerically.
This scheme is superior to and does not agree with the standard
secondary extremal problem, as is shown in §4.

Other work on small noise problems includes the completely
observable work of Fleming [1]. Other approaches to open
loop control problems include Mortensen [5] and Van Slyke and
Wets [6].

2. The problem. Suppose that the state $\xi(t)$ evolves according
to the stochastic differential equations

$$d\xi = f(t,\xi(t),U(t))\,dt + (2\varepsilon)^{1/2}\,dw(t) \tag{1}$$

where $w$ is $n$ dimensional Brownian motion, and with initial
condition $\xi(s_0) = x_0$, a constant in $R^n$. In (1) $U$ is a
control with values in the control set $K = R^k$. We seek to
minimize

$$J(U) = E\Big\{ \int_{s_0}^{T} L(t,\xi(t),U(t))\,dt \;\Big|\; \xi(s_0) = x_0 \Big\} \tag{2}$$

over the class of open loop controls. An open loop control
$U$ is a Borel measurable function on $[s_0,T]$ with values in
$K$.
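To make (1) and (2) concrete, the cost of a given open loop control can be estimated by simulation. The following is only a minimal sketch for a scalar state, assuming an Euler-Maruyama discretization of (1) and a left-endpoint Riemann sum for (2); the function name `estimate_cost` and all of its parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def estimate_cost(f, L, U, x0, s0, T, eps, n_steps=200, n_paths=20000, seed=0):
    """Monte Carlo estimate of J(U) in (2) for a scalar state (illustrative only)."""
    rng = np.random.default_rng(seed)
    dt = (T - s0) / n_steps
    x = np.full(n_paths, float(x0))      # all paths start at x0
    cost = np.zeros(n_paths)
    for k in range(n_steps):
        t = s0 + k * dt
        u = U(t)                         # open loop: depends on t only
        cost += L(t, x, u) * dt          # left-endpoint Riemann sum of the running cost
        noise = np.sqrt(2.0 * eps * dt) * rng.standard_normal(n_paths)
        x = x + f(t, x, u) * dt + noise  # Euler-Maruyama step for (1)
    return cost.mean()

# Toy check: f = u, L = x^2, U = 0, x0 = 0 on [0, 1]. Then E xi(t)^2 = 2*eps*t,
# so the exact cost is eps and the estimate should be close to 0.1.
est = estimate_cost(f=lambda t, x, u: u + 0.0 * x,
                    L=lambda t, x, u: x ** 2,
                    U=lambda t: 0.0,
                    x0=0.0, s0=0.0, T=1.0, eps=0.1)
print(est)
```

Because the control is open loop, `U` is a function of time alone and the same control value is applied to every simulated path.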

Let $Q = [s_0,T] \times R^n$. Throughout we assume the following:

(i) The initial point $(s_0,x_0)$ is a fixed constant in
$R^{n+1}$, and is known to the controller. There exists a unique
optimal open loop control $U^0$ for the deterministic control
problem (1), (2) with $\varepsilon = 0$.

(ii) $f(t,x,u) = A(t,x) + B(t)u$ with $A$, $B$ smooth
functions.

(iii) $L$ is a smooth function and there exists $C_0 > 0$
such that $v^T L_{uu}(s,x,u)\,v \ge C_0 |v|^2$ for all $(s,x,u)$.

Concerning (ii), see the remarks in §4.

The determination of the optimal control $U^\varepsilon$ for the
$\varepsilon$ problem, even numerically, is impossible in general and one
seeks approximations to $U^\varepsilon$. We propose here such a scheme.

Let $U^0$ denote the optimal deterministic open loop control
corresponding to starting at $(s_0,x_0)$. We seek a "best" approximation
scheme of the form $V^\varepsilon = U^0 + \varepsilon V$.

Let $J^\varepsilon$ denote the cost function in (2) when the noise
parameter $\varepsilon$ is used in (1). Then we have the following result, whose
proof is contained in [2] and follows the method of §4 in [3].


Theorem 1. For each Hölder continuous function $V$,

$$J^\varepsilon(V^\varepsilon) = J^0(U^0) + \varepsilon\chi + \varepsilon^2\,\Gamma(V) + o(\varepsilon^2)$$

where $\chi$ is independent of $V$ and $\Gamma(V)$ is given by

$$\Gamma(V) = \int_{s_0}^{T} \Big[ \phi_x(t,\xi^0(t),V)\,B(t)V(t) + \Delta_x\phi(t,\xi^0(t),V) + \tfrac12 V'(t)\,L_{uu}(t,\xi^0(t),U^0(t))\,V(t) \Big]\,dt. \tag{3}$$

Here $\xi^0(t)$ is the optimal trajectory for the open loop deterministic
control problem with initial condition $\xi(s_0) = x_0$
and $\phi(s,x,V)$ satisfies

$$\phi_s + \phi_x\,f(s,x,U^0(s)) + \Delta_x\psi^0(s,x) + \{L_u(s,x,U^0(s)) + \psi^0_x(s,x)B(s)\}\,V(s) = 0 \tag{4}$$

on $[s_0,T] \times R^n$ with terminal condition $\phi(T,x) = 0$. The
function $\psi^0(t,x)$ satisfies

$$\psi^0_t + \psi^0_x\,f(t,x,U^0(t)) + L(t,x,U^0(t)) = 0 \tag{5}$$

with terminal condition $\psi^0(T,x) = 0$.


Remark. $\psi^0(t,x)$ is the cost of starting at $(t,x)$, $t \ge s_0$,
and using the open loop control $U^0$ corresponding to the initial
point $(s_0,x_0)$. Note that the notational dependence of $\phi$ on
$V$ only indicates that for a fixed function $V$, $\phi$ satisfies a
linear partial differential equation depending upon $V$.

Since $\chi$ is independent of the choice of the Hölder
continuous function $V$, let us attempt to choose $V$ so as to
minimize the quantity $\Gamma(V)$. This will be considered the "best"
approximate control.


3. Solution of the $\Gamma(V)$ control problem.

The minimization of $\Gamma(V)$ can be formulated as a deterministic
control problem, in fact, of a generalized linear regulator type.
Below, in Corollary 1, we prescribe an explicit scheme for the
calculation of the minimizing $V$.


Define $g_i(t) = \phi_{x_i}(t,\xi^0(t),V)$, $h_{ij}(t) = \phi_{x_i x_j}(t,\xi^0(t),V)$,
and let $g(t) = (g_1(t),\ldots,g_n(t))'$, $h(t) = (h_{11}(t),\ldots,h_{1n}(t),\ldots,h_{nn}(t))'$.
Since $\phi_{x_i}(t,x,V)$ satisfies

$$\begin{aligned}
&(\phi_{x_i})_t(t,x,V) + \phi_{x x_i}(t,x,V)\,f(t,x,U^0(t)) + \phi_x(t,x,V)\,f_{x_i}(t,x,U^0(t)) \\
&\quad + (\Delta_x\psi^0)_{x_i}(t,x) + \{L_{u x_i}(t,x,U^0(t)) + \psi^0_{x x_i}(t,x)B(t)\}\,V(t) = 0
\end{aligned} \tag{6}$$

with $\phi_{x_i}(T,x,V) = 0$, then, differentiating along the trajectory
$\xi^0(t)$, $g_i(t)$ satisfies

$$\begin{aligned}
-\frac{dg_i}{dt} &= f_{x_i}^T(t,\xi^0(t),U^0(t))\,g(t) + (\Delta_x\psi^0)_{x_i}(t,\xi^0(t)) \\
&\quad + \{L_{u x_i}(t,\xi^0(t),U^0(t)) + \psi^0_{x x_i}(t,\xi^0(t))B(t)\}\,V(t)
\end{aligned} \tag{7}$$

with $g_i(T) = 0$, $i = 1,\ldots,n$. Similarly $\phi_{x_i x_j}(t,x,V)$ satisfies

$$\begin{aligned}
&(\phi_{x_i x_j})_t(t,x,V) + \phi_{x x_i x_j}(t,x,V)\,f(t,x,U^0(t)) + \phi_x(t,x,V)\,f_{x_i x_j}(t,x,U^0(t)) \\
&\quad + \phi_{x x_i}(t,x,V)\,f_{x_j}(t,x,U^0(t)) + \phi_{x x_j}(t,x,V)\,f_{x_i}(t,x,U^0(t)) \\
&\quad + (\Delta_x\psi^0)_{x_i x_j}(t,x) + \{L_{u x_i x_j}(t,x,U^0(t)) + \psi^0_{x x_i x_j}(t,x)B(t)\}\,V(t) = 0
\end{aligned} \tag{8}$$

with boundary condition $\phi_{x_i x_j}(T,x,V) = 0$, hence

$$\begin{aligned}
-\frac{dh_{ij}(t)}{dt} &= \sum_{k=1}^{n} h_{kj}(t)\,f^k_{x_i}(t,\xi^0(t),U^0(t)) + \sum_{k=1}^{n} h_{ki}(t)\,f^k_{x_j}(t,\xi^0(t),U^0(t)) \\
&\quad + g'(t)\,f_{x_i x_j}(t,\xi^0(t),U^0(t)) + (\Delta_x\psi^0)_{x_i x_j}(t,\xi^0(t)) \\
&\quad + \{L_{u x_i x_j}(t,\xi^0(t),U^0(t)) + \psi^0_{x x_i x_j}(t,\xi^0(t))B(t)\}\,V(t)
\end{aligned} \tag{9}$$

with final condition $h_{ij}(T) = 0$, where $f^k$ denotes the $k$-th
component of $f$. The cost function becomes

$$J_4(V) = \int_{s_0}^{T} \Big[ \sum_{i=1}^{n} h_{ii}(t) + g'(t)B(t)V(t) + \tfrac12 V'(t)\,L_{uu}(t,\xi^0(t),U^0(t))\,V(t) \Big]\,dt. \tag{10}$$


Thus we now have a deterministic control problem with state
equations (7), (9), control function $V$ and cost function
(10). Time now runs backwards, that is, we prescribe $h$ and $g$
at the final time $T$, but the functions $g$ and $h$ are
unspecified at time $s_0$. The quantities $\psi^0_{x_i x_j}(t,\xi^0(t))$ and
$\psi^0_{x_k x_i x_j}(t,\xi^0(t))$ can be found easily using the method of
characteristics once $U^0(t)$ is known. One simply repeats the
procedure on $\psi^0$ used in deriving equations (7) and (9).

We now formulate a generalized linear regulator problem
for

$$z(t) \overset{\mathrm{def}}{=} (g_1,\ldots,g_n,h_{11},\ldots,h_{1n},\ldots,h_{n1},\ldots,h_{nn})'(T-t).$$

(Of course, $h_{ij} = h_{ji}$, so in actual numerical computation some
of the terms may be eliminated.) The equations for $z$ can be
written in the form

$$\frac{dz}{dt} = D_1 z + D_2 w + E, \tag{11}$$

$z(0) = 0$, with cost function

$$J_5(w) = \int_0^{T-s_0} \big[ K^T z + z^T R w + \tfrac12 w^T Q w \big]\,dt \tag{12}$$

for appropriate matrices $D_1$, $D_2$, $R$, $Q$ ($R$, $Q$ symmetric) and
vectors $E$, $K$, and control function $w(t) = V(T-t)$.

This problem can be solved using dynamic programming. Let
$\phi(t,z)$ be the optimal cost corresponding to the control problem
(11), (12) but with initial condition $z(t) = z$ instead of
$z(0) = 0$. Then $\phi$ satisfies the Hamilton-Jacobi equation

$$\phi_t + \phi_z D_1 z + \phi_z E + K^T z + \min_w \big[ (\phi_z D_2 + z^T R)\,w + \tfrac12 w^T Q w \big] = 0. \tag{13}$$

The minimum in (13) is obtained when

$$w = -Q^{-1}[\phi_z D_2 + z^T R]^T \tag{14}$$

hence (13) can be written as

$$\phi_t + \phi_z (D_1 z + E) + K^T z - \tfrac12 (\phi_z D_2 + z^T R)\,Q^{-1}\,(\phi_z D_2 + z^T R)^T = 0. \tag{15}$$

This equation has the solution

$$\phi(t,z) = \tfrac12 z^T P(t) z + r^T(t) z + q(t) \tag{16}$$

where

$$\tfrac12 P' + P D_1 - \tfrac12 R Q^{-1} R^T - \tfrac12 P^T D_2 Q^{-1} D_2^T P - R Q^{-1} D_2^T P = 0,$$

$$(r')^T + r^T D_1 + E^T P + K^T - r^T D_2 Q^{-1} D_2^T P - r^T D_2 Q^{-1} R^T = 0,$$

$$q' + r^T E - \tfrac12 r^T D_2 Q^{-1} D_2^T r = 0,$$

and with initial conditions

$$q(0) = 0, \quad r^T(0) = 0, \quad P(0) = 0.$$

Using (14) one obtains that the optimal feedback control is

$$w(t,z) = -Q^{-1}[D_2^T r + D_2^T P z + R^T z].$$
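The step from (13) to (14) and (15) is a completion of squares. As a brief sketch, write $a = \phi_z D_2 + z^T R$ (a row vector; this shorthand is introduced only for this note) and assume $Q$ is symmetric and positive definite, which is what one expects here since $Q$ corresponds to $L_{uu}$ in (10) and assumption (iii) holds:

```latex
\begin{aligned}
a\,w + \tfrac12 w^T Q w
  &= \tfrac12 \big(w + Q^{-1}a^T\big)^T Q \big(w + Q^{-1}a^T\big) - \tfrac12\, a\,Q^{-1}a^T, \\
\min_w \big[\, a\,w + \tfrac12 w^T Q w \,\big]
  &= -\tfrac12\, a\,Q^{-1}a^T
  \quad\text{attained at}\quad w = -Q^{-1}a^T .
\end{aligned}
```

The first term on the right of the first line is nonnegative and vanishes exactly at the stated $w$, which gives the minimizer in (14) and the quadratic term in (15).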


Therefore we have the following

Corollary 1. The function $V^*(t)$ minimizing $\Gamma(V)$ is given by

$$V^*(t) = w(T-t, z^0(T-t))$$

where $z^0(t)$ is the solution to (11) with $w = w(t,z)$ and
$z^0(0) = 0$.


4. Conclusions.

Example 1. Consider the scalar equation

$$d\xi(t) = U(t)\,dt + (2\varepsilon)^{1/2}\,dw(t), \quad \xi(0) = 0,$$

and cost function

$$E\int_0^1 \big[ (\xi(t)^2 + \xi(t))^2 + \xi(t)^2 + \tfrac12 U(t)^2 \big]\,dt.$$

This problem is actually of the type considered in [4], but
let us use the methods of the paper to determine the optimal
$V^*$. Since $U^0 \equiv 0$, then $\psi^0(t,x) = (1-t)(x^4 + 2x^3 + 2x^2)$ and
the deterministic control problem for $V$ is the following:
minimize

$$\int_0^1 \big[ h_{11}(t) + g_1(t)V(t) + \tfrac12 V^2(t) \big]\,dt$$

with state equations

$$-\frac{dg_1(t)}{dt} = 12(1-t) + 4(1-t)V(t), \quad g_1(1) = 0,$$

$$-\frac{dh_{11}(t)}{dt} = 24(1-t) + 12(1-t)V(t), \quad h_{11}(1) = 0,$$

over the class of open loop controls $V(t)$. Rather than use
the procedure of Section 3, we use Pontryagin's maximum principle
to determine $V$. $V$ is determined from the equation

$$V(t) + g_1(t) - 4p_1(t)(1-t) - 12p_2(t)(1-t) = 0$$

where $p_1(t)$ and $p_2(t)$ are the costate variables which satisfy

$$\frac{dp_1(t)}{dt} = -V(t), \quad p_1(0) = 0,$$

and

$$\frac{dp_2(t)}{dt} = -1, \quad p_2(0) = 0.$$

It is easily verified that $V(t) = -3(1 - (\operatorname{sech} 2)\cosh 2t)$
satisfies the above equations. Recall that $V^\varepsilon = U^0 + \varepsilon V$
is then the best approximate control. The costs of using $V^\varepsilon$
and $U^0$ in the $\varepsilon$-problem for various $\varepsilon$ are listed in Table 1.
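As a numerical cross-check of the closed form for $V$, the following sketch discretizes the deterministic control problem of Example 1 and minimizes the resulting quadratic cost directly. This is not the procedure of Section 3 nor the maximum principle argument used above. It assumes the state equations are read as $-dg_1/dt = 12(1-t) + 4(1-t)V$ and $-dh_{11}/dt = 24(1-t) + 12(1-t)V$, integrated backward from $g_1(1) = h_{11}(1) = 0$; the midpoint grid and the function names are illustrative choices.

```python
import numpy as np

def minimize_gamma_example(n=400):
    """Minimize the discretized cost int_0^1 [h11 + g1*V + V^2/2] dt over V."""
    h = 1.0 / n
    t = (np.arange(n) + 0.5) * h           # cell midpoints in [0, 1]
    w = 1.0 - t                            # the factor (1 - t)
    # S approximates v -> integral from t_i to 1 of v(s) ds, i.e. backward
    # integration from the terminal conditions g1(1) = h11(1) = 0.
    S = h * np.triu(np.ones((n, n)), k=1) + 0.5 * h * np.eye(n)
    # g1 = g0 + G @ V and h11 = const + H @ V are affine in V.
    g0 = S @ (12.0 * w)
    G = S * (4.0 * w)                      # column j scaled by 4 * (1 - t_j)
    H = S * (12.0 * w)                     # column j scaled by 12 * (1 - t_j)
    # Discretized cost = const + b @ V + 0.5 * V @ A @ V.
    b = h * (H.sum(axis=0) + g0)
    A = h * (G + G.T + np.eye(n))
    V = np.linalg.solve(A, -b)             # stationary point of the quadratic
    return t, V

def closed_form_V(t):
    """V(t) = -3 (1 - sech(2) cosh(2 t)) as stated in Example 1."""
    return -3.0 * (1.0 - np.cosh(2.0 * t) / np.cosh(2.0))

t, V_num = minimize_gamma_example()
max_gap = np.max(np.abs(V_num - closed_form_V(t)))
print(f"max |V_num - V_closed| = {max_gap:.2e}")
```

The cost is quadratic in $V$ because $g_1$ and $h_{11}$ depend affinely on $V$, so a single linear solve gives the discrete minimizer; the reported gap should shrink as `n` grows.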


  $\varepsilon$     Cost Using $U^0$     Cost Using $V^\varepsilon$ - Cost Using $U^0$

   0        0            0
  .04       .0864       -.00232
  .08       .1856       -.00714
  .12       .2976       -.011946
  .16       .4224       -.01375
  .20       .56         -.00952
  .24       .7104        .00388
  .40      1.44          .21644
  .80      4.16         3.20844
 1.00      6            7.08288

Table 1


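For $\varepsilon = .12$ the use of $V^\varepsilon$ realizes an approximately 4%
decrease in cost over the cost of using $U^0$. However, note that
as $\varepsilon$ increases (in Table 1, for $\varepsilon \ge .24$), the use of $V^\varepsilon$
incurs more cost than using $U^0$ in the $\varepsilon$-problem.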
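The "Cost Using $U^0$" column can be checked in closed form. With $U^0 \equiv 0$ the state is $\xi(t) = (2\varepsilon)^{1/2} w(t)$, which is Gaussian with mean zero and variance $2\varepsilon t$, so $E\,\xi^2 = 2\varepsilon t$, $E\,\xi^3 = 0$, $E\,\xi^4 = 12\varepsilon^2 t^2$, and the cost is $\int_0^1 (12\varepsilon^2 t^2 + 4\varepsilon t)\,dt = 2\varepsilon + 4\varepsilon^2$. The short sketch below compares this formula with the tabulated values; the function and variable names are illustrative.

```python
def cost_using_u0(eps):
    """Closed-form cost of U0 = 0 in the eps-problem of Example 1."""
    return 2.0 * eps + 4.0 * eps ** 2

# "Cost Using U0" column of Table 1, keyed by eps.
table_1 = {0.0: 0.0, 0.04: 0.0864, 0.08: 0.1856, 0.12: 0.2976, 0.16: 0.4224,
           0.20: 0.56, 0.24: 0.7104, 0.40: 1.44, 0.80: 4.16, 1.00: 6.0}

for eps, listed in table_1.items():
    print(f"eps={eps:.2f}  formula={cost_using_u0(eps):.4f}  table={listed:.4f}")
```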



Remark. The "best" control approximation technique is admittedly
complex. In partial justification for such a complex scheme,
let us show that a less complicated scheme, an accessory
stochastic control problem similar to that used for the deterministic
control problem in [7], yields a trivial and unusable solution.

Consider linear state equations of the form

$$d\xi(t) = [A(t)\xi(t) + B(t)U(t)]\,dt + (2\varepsilon)^{1/2}\,dw, \quad \xi(s_0) = x_0,$$

$U(t) \in R^k$, with cost function $L$. Define $x(t) = \xi(t) - \xi^0(t)$,
$V(t) = U(t) - U^0(t)$; then $x(t)$ satisfies the equation

$$dx = [A(t)x + B(t)V(t)]\,dt + (2\varepsilon)^{1/2}\,dw, \quad x(s_0) = 0,$$

with cost function

$$E\int_{s_0}^{T} \big[ L(t,\xi(t),U(t)) - L(t,\xi^0(t),U^0(t)) \big]\,dt.$$

Since

$$E\int_{s_0}^{T} \big[ L_x(t,\xi^0(t),U^0(t))(\xi(t) - \xi^0(t)) + L_u(t,\xi^0(t),U^0(t))(U(t) - U^0(t)) \big]\,dt = 0,$$

then the new cost function can be approximated by

$$E\int_{s_0}^{T} \tfrac12\,(x(t),V(t)) \begin{pmatrix} L^0_{xx} & L^0_{xu} \\ L^0_{ux} & L^0_{uu} \end{pmatrix} (x(t),V(t))'\,dt$$

where the 0 indicates evaluation along $(t,\xi^0(t),U^0(t))$. If
the matrix of partial derivatives of $L$ is positive definite,
then this approximate control problem is minimized by the choice
$V(t) \equiv 0$ since $x(s_0) = 0$. Thus this linearization technique to
compute a correction factor yields zero correction. However,
Example 1 was of the above type and a correction term yielded a
lower cost than using $U^0$ for sufficiently small $\varepsilon$.




Remark. The approximation technique described in this chapter
can also be used if $B = B(t,x)$. However, the equations for
$g(t)$, $h(t)$ are complicated slightly by the addition of terms
involving the $x$-partial derivatives of $B$ evaluated along
$(t,\xi^0(t))$.


Remark. The original work on the problem was done in an unpublished
part of the author's dissertation [2]. Recently,
we have discovered the convenient solution to the auxiliary
minimization problem which was lacking in [2], and which
makes the auxiliary problem tractable for large scale systems.